{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### <font color='green'>1. Description<font>\n",
    "\n",
    "Click though rate prediction using logistic regression. Please download the data from https://www.kaggle.com/c/avazu-ctr-prediction/ manually (registration required) and place `CTR_train` file in `datasets` directory. Only part of the data is used, because it takes too large memory for one VE card.\n",
    "\n",
    "In online advertising, click-through rate (CTR) is a very important metric for evaluating ad performance. As a result, click prediction systems are essential and widely used for sponsored search and real-time bidding. This data is 11 days worth of Avazu data to build and test prediction models."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### <font color='green'>2. Data Preprocessing<font>\n",
    "\n",
    "For CTR classification we will perform some data preparation and data cleaning steps. We will generate feature vectors using sklearn TF-IDF for review text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import time\n",
    "import pandas as pd\n",
    "from collections import OrderedDict\n",
    "from sklearn import metrics\n",
    "from sklearn.preprocessing import OneHotEncoder\n",
    "from sklearn.model_selection import train_test_split\n",
    "import seaborn as sns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def preprocess_data(fname):\n",
    "    '''\n",
    "    For CTR classification we will perform some data preparation and data cleaning steps.\n",
    "    '''\n",
    "    #Sample size of 1 million has been taken for execution from CTR dataset\n",
    "    n_rows =  800000\n",
    "    df = pd.read_csv(fname, nrows=n_rows)\n",
    "    class_names = {0:'Not click', 1:'Click'}\n",
    "    print(df.click.value_counts().rename(index = class_names))\n",
    "    # We dropped 'click', 'id', 'hour', 'device_id', 'device_ip' from the dataset,\n",
    "    # which does not contribute the prediction.\n",
    "    x = df.drop(['click', 'id', 'hour', 'device_id', 'device_ip'], axis=1).values\n",
    "    y = df['click'].values\n",
    "    n_rows = df.shape[0]\n",
    "    x_train, x_test, y_train, y_test = train_test_split(x,y, test_size = 0.05)\n",
    "    # Other features are one-hot encoded; so the feature matrix becomes sparse matrix\n",
    "    enc = OneHotEncoder(handle_unknown='ignore')\n",
    "    x_train_enc = enc.fit_transform(x_train)\n",
    "    x_test_enc = enc.transform(x_test)\n",
    "    return x_train_enc, x_test_enc, y_train, y_test"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Not click    669744\n",
      "Click        130256\n",
      "Name: click, dtype: int64\n",
      "shape of train data: (760000, 11452)\n",
      "shape of test data: (40000, 11452)\n"
     ]
    }
   ],
   "source": [
    "#---- Data Preparation ----\n",
    "\n",
    "DATA_FILE = \"datasets/ctr_train.csv\"\n",
    "x_train, x_test, y_train, y_test = preprocess_data(DATA_FILE)\n",
    "print(\"shape of train data: {}\".format(x_train.shape))\n",
    "print(\"shape of test data: {}\".format(x_test.shape))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot:ylabel='count'>"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZcAAAD4CAYAAAAgs6s2AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAASR0lEQVR4nO3df4yd113n8fencQNZII3TDN5gZ3EEFsgU2iajxMBqtdtoHacLOEIlSgXYBKteqSkCsdol3T/WS7KVyv7qNmyxZBE3NmIJVqHEoLTGcgsIQdpMaEiahCqzodnYSurBdhOgaqt0v/xxj+F2cud6Es69E4/fL+nRfZ7vc85zzkhRPnp+OlWFJEk9vW6lJyBJWn0MF0lSd4aLJKk7w0WS1J3hIknqbs1KT+C14oorrqiNGzeu9DQk6bzy8MMP/1VVzSyuGy7Nxo0bmZubW+lpSNJ5Jckzo+peFpMkdWe4SJK6M1wkSd0ZLpKk7gwXSVJ3hoskqTvDRZLUneEiSerOcJEkdecb+h1d++8PrvQU9Brz8H/bsdJTkFaEZy6SpO4MF0lSd4aLJKk7w0WS1J3hIknqznCRJHVnuEiSuptouCS5LMlHkvxFkieTfH+Sy5McTfJU+13b2ibJ3Unmkzya5Jqh4+xs7Z9KsnOofm2Sx1qfu5Ok1UeOIUmajkmfuXwQ+HhVfTfwZuBJ4A7gWFVtAo61bYCbgE1t2Q3shUFQAHuA64HrgD1DYbEXeNdQv22tvtQYkqQpmFi4JHkD8C+AewCq6qtV9UVgO3CgNTsA3NzWtwMHa+BB4LIkVwI3Aker6nRVnQGOAtvavkur6sGqKuDgomONGkOSNAWTPHO5GlgAPpzkM0l+Nck3Aeuq6rnW5nlgXVtfDzw71P94q42rHx9RZ8wYkqQpmGS4rAGuAfZW1VuBv2XR5al2xlETnMPYMZLsTjKXZG5hYWGS05CkC8okw+U4cLyqPtW2P8IgbL7QLmnRfk+2/SeAq4b6b2i1cfUNI+qMGePrVNW+qpqtqtmZmZlX9UdKkl5uYuFSVc8Dzyb5rla6AXgCOAycfeJrJ3B/Wz8M7GhPjW0BXmiXto4AW5OsbTfytwJH2r4Xk2xpT4ntWHSsUWNIkqZg0p/c/xng15NcDDwN3MYg0A4l2QU8A9zS2j4AvB2YB77U2lJVp5PcBTzU2t1ZVafb+ruBe4FLgI+1BeD9S4whSZqCiYZLVT0CzI7YdcOItgXcvsRx9gP7R9TngDeNqJ8aNYYkaTp8Q1+S1J3hIknqznCRJHVnuEiSujNcJEndGS6SpO4MF0lSd4aLJKk7w0WS1J3hIknqznCRJHVnuEiSujNcJEndGS6SpO4MF0lSd4aLJKk7w0WS1J3hIknqznCRJHVnuEiSujNcJEndGS6SpO4MF0lSd4aLJKm7iYZLks8neSzJI0nmWu3yJEeTPNV+17Z6ktydZD7Jo0muGTrOztb+qSQ7h+rXtuPPt74ZN4YkaTqmcebyr6rqLVU127bvAI5V1SbgWNsGuAnY1JbdwF4YBAWwB7geuA7YMxQWe4F3DfXbdo4xJElTsBKXxbYDB9r6AeDmofrBGngQuCzJlcCNwNGqOl1VZ4CjwLa279KqerCqCji46FijxpAkTcGkw6WA30/ycJLdrbauqp5r688D69r6euDZob7HW21c/fiI+rgxvk6S3UnmkswtLCy84j9OkjTamgkf/59X1Ykk3wocTfIXwzurqpLUJCcwboyq2gfsA5idnZ3oPCTpQjLRM5eqOtF+TwIfZXDP5Avtkhbt92RrfgK4aqj7hlYbV98wos6YMSRJUzCxcEnyTUm+5ew6sBX4LHAYOPvE107g/rZ+GNjRnhrbArzQLm0dAbYmWdtu5G8FjrR9LybZ0p4S27HoWKPGkCRNwSQvi60DPtqeDl4D/J+q+niSh4BDSXYBzwC3tPYPAG8H5oEvAbcBVNXpJHcBD7V2d1bV6bb+buBe4BLgY20BeP8SY0iSpmBi4VJVTwNvHlE/Bdwwol7A7Uscaz+wf0R9DnjTcseQJE2Hb+hLkrozXCRJ3RkukqTuDBdJUneGiySpO8NFktSd4SJJ6s5wkSR1Z7hIkrozXCRJ3RkukqTuDBdJUneGiySpO8NFktSd4SJJ6s5wkSR1Z7hIkrozXCRJ3RkukqTuDBdJUneGiySpO8NFktSd4SJJ6m7i4ZLkoiSfSfJ7bfvqJJ9KMp/kN5Nc3Orf0Lbn2/6NQ8d4b6t/LsmNQ/VtrTaf5I6h+sgxJEnTMY0zl58Fnhza/iXgA1X1ncAZYFer7wLOtPoHWjuSbAZuBb4H2Ab8Sgusi4APATcBm4F3trbjxpAkTcFEwyXJBuDfAL/atgO8DfhIa3IAuLmtb2/btP03tPbbgfuq6itV9ZfAPHBdW+ar6umq+ipwH7D9HGNIkqZg0mcu/wv4D8D/b9tvBL5YVS+17ePA+ra+HngWoO1/obX/+/qiPkvVx40hSZqCiYVLkh8CTlbVw5Ma4x8rye4kc0nmFhYWVno6krRqTPLM5QeBH0nyeQaXrN4GfBC4LMma1mYDcKKtnwCuAmj73wCcGq4v6rNU/dSYMb5OVe2rqtmqmp2ZmXn1f6kk6etMLFyq6r1VtaGqNjK4If+Jqvpx4JPAO1qzncD9bf1w26bt/0RVVavf2p4muxrYBHwaeAjY1J4Mu7iNcbj1WWoMSdIUrMR7Lr8A/HySeQb3R+5p9XuAN7b6zwN3AFTV48Ah4Ang48DtVfW1dk/lPcARBk+jHWptx40hSZqCNedu8o9XVX8A/EFbf5rBk16L23wZ+LEl+r8PeN+I+gPAAyPqI8eQJE2Hb+hLkrozXCRJ3RkukqTuDBdJUneGiySpu2WFS5Jjy6lJkgTneBQ5yTcC/wS4IslaIG3Xpfi9LknSEs71nsu/BX4O+DbgYf4hXF4E/vfkpiVJOp+NDZeq+iDwwSQ/U1W/PKU5SZLOc8t6Q7+qfjnJDwAbh/tU1cEJzUuSdB5bVrgk+TXgO4BHgK+1cgGGiyTpZZb7bbFZYHP74rAkSWMt9z2XzwL/dJITkSStHss9c7kCeCLJp4GvnC1W1Y9MZFaSpPPacsPlP09yEpKk1WW5T4v94aQnIklaPZb7tNhfM3g6DOBi4PXA31bVpZOamCTp/LXcM5dvObueJMB2YMukJiVJOr+94q8i18DvADf2n44kaTVY7mWxHx3afB2D916+PJEZSZLOe8t9WuyHh9ZfAj7P4NKYJEkvs9x7LrdNeiKSpNVjuf9Y2IYkH01ysi2/lWTDpCcnSTo/LfeG/oeBwwz+XZdvA3631SRJepnlhstMVX24ql5qy73AzLgOSb4xyaeT/HmSx5P8YqtfneRTSeaT/GaSi1v9G9r2fNu/cehY7231zyW5cai+rdXmk9wxVB85hiRpOpYbLqeS/ESSi9ryE8Cpc/T5CvC2qnoz8BZgW5ItwC8BH6iq7wTOALta+13AmVb/QGtHks3ArcD3ANuAXzk7D+BDwE3AZuCdrS1jxpAkTcFyw+WngVuA54HngHcAPzWuQ3sf5m/a5uvbUsDbgI+0+gHg5ra+vW3T9t8w9MLmfVX1lar6S2AeuK4t81X1dFV9FbgP2N76LDWGJGkKlhsudwI7q2qmqr6VQdj84rk6tTOMR4CTwFHg/wJfrKqXWpPjwPq2vh54FqDtfwF443B9UZ+l6m8cM8bi+e1OMpdkbmFh4Vx/jiRpmZYbLt9XVWfOblTVaeCt5+pUVV+rqrcAGxicaXz3q5nkpFTVvqqararZmZmxt5AkSa/AcsPldUnWnt1IcjnLfwGTqvoi8Eng+4HLkpztuwE40dZPAFe1468B3sDgvs7f1xf1Wap+aswYkqQpWG64/A/gT5PcleQu4E+A/zquQ5KZJJe19UuAfw08ySBk3tGa7QTub+uH2zZt/yfaP6t8GLi1PU12NbAJ+DTwELCpPRl2MYOb/odbn6XGkCRNwXLf0D+YZI7BjXKAH62qJ87R7UrgQHuq63XAoar6vSRPAPcl+S/AZ4B7Wvt7gF9LMg+cZhAWVNXjSQ4BTzD49MztVfU1gCTvAY4AFwH7q+rxdqxfWGIMSdIUvJJLW08w+B/8cts/yoj7MlX1NIP7L4vrXwZ+bIljvQ9434j6A8ADyx1DkjQdr/iT+5IknYvhIknqznCRJHVnuEiSujNcJEndGS6SpO4MF0lSd4aLJKk7w0WS1J3hIknqznCRJHVnuEiSujNcJEndGS6SpO4MF0lSd4aLJKk7w0WS1J3hIknqznCRJHVnuEiSujNcJEndGS6SpO4MF0lSd4aLJKm7iYVLkquSfDLJE0keT/KzrX55kqNJnmq/a1s9Se5OMp/k0STXDB1rZ2v/VJKdQ/VrkzzW+tydJOPGkCRNxyTPXF4C/l1VbQa2ALcn2QzcARyrqk3AsbYNcBOwqS27gb0wCApgD3A9cB2wZygs9gLvGuq3rdWXGkOSNAUTC5eqeq6q/qyt/zXwJLAe2A4caM0OADe39e3AwRp4ELgsyZXAjcDRqjpdVWeAo8C2tu/Sqnqwqgo4uOhYo8aQJE3BVO65JNkIvBX4FLCuqp5ru54H1rX19cCzQ92Ot9q4+vERdcaMsXheu5PMJZlbWFh4FX+ZJGmUiYdLkm8Gfgv4uap6cXhfO+OoSY4/boyq2ldVs1U1OzMzM8lpSNIFZaLhkuT1DILl16vqt1v5C+2SFu33ZKufAK4a6r6h1cbVN4yojxtDkjQFk3xaLMA9wJNV9T+Hdh0Gzj7xtRO4f6i+oz01tgV4oV3aOgJsTbK23cjfChxp+15MsqWNtWPRsUaNIUmagjUTPPYPAj8JPJbkkVb7j8D7gUNJdgHPALe0fQ8AbwfmgS8BtwFU1ekkdwEPtXZ3VtXptv5u4F7gEuBjbWHMGJKkKZhYuFTVHwNZYvcNI9oXcPsSx9oP7B9RnwPeNKJ+atQYkqTp8A19SVJ3hoskqTvDRZLUneEiSerOcJEkdWe4SJK6M1wkSd0ZLpKk7gwXSVJ3hoskqTvDRZLUneEiSerOcJEkdWe4SJK6M1wkSd0ZLpKk7gwXSVJ3hoskqTvDRZLU3ZqVnoCkyft/d37vSk9Br0H/7D89NrFje+YiSerOcJEkdWe4SJK6m1i4JNmf5GSSzw7VLk9yNMlT7XdtqyfJ3Unmkzya5JqhPjtb+6eS7ByqX5vksdbn7iQZN4YkaXomeeZyL7BtUe0O4FhVbQKOtW2Am4BNbdkN7IVBUAB7gOuB64A9Q2GxF3jXUL9t5xhDkjQlEwuXqvoj4PSi8nbgQFs/ANw8VD9YAw8ClyW5ErgROFpVp6vqDHAU2Nb2XVpVD1ZVAQcXHWvUGJKkKZn2PZd1VfVcW38eWNfW1wPPDrU73mrj6sdH1MeN8TJJdieZSzK3sLDwKv4cSdIoK3ZDv51x1EqOUVX7qmq2qmZnZmYmORVJuqBMO1y+0C5p0X5PtvoJ4KqhdhtabVx9w4j6uDEkSVMy7XA5DJx94msncP9QfUd7amwL8EK7tHUE2JpkbbuRvxU40va9mGRLe0psx6JjjRpDkjQlE/v8S5LfAP4lcEWS4wye+no/cCjJLuAZ4JbW/AHg7cA88CXgNoCqOp3kLuCh1u7Oqjr7kMC7GTyRdgnwsbYwZgxJ0pRMLFyq6p1L7LphRNsCbl/iOPuB/SPqc8CbRtRPjRpDkjQ9vqEvSerOcJEkdWe4SJK6M1wkSd0ZLpKk7gwXSVJ3hoskqTvDRZLUneEiSerOcJEkdWe4SJK6M1wkSd0ZLpKk7gwXSVJ3hoskqTvDRZLUneEiSerOcJEkdWe4SJK6M1wkSd0ZLpKk7gwXSVJ3hoskqTvDRZLU3aoNlyTbknwuyXySO1Z6PpJ0IVmV4ZLkIuBDwE3AZuCdSTav7Kwk6cKxKsMFuA6Yr6qnq+qrwH3A9hWekyRdMNas9AQmZD3w7ND2ceD6xY2S7AZ2t82/SfK5KcztQnEF8FcrPYmVlv++c6WnoJfzv82z9qTHUb59VHG1hsuyVNU+YN9Kz2M1SjJXVbMrPQ9pMf/bnI7VelnsBHDV0PaGVpMkTcFqDZeHgE1Jrk5yMXArcHiF5yRJF4xVeVmsql5K8h7gCHARsL+qHl/haV1ovNyo1yr/25yCVNVKz0GStMqs1stikqQVZLhIkrozXNSVn93Ra1WS/UlOJvnsSs/lQmC4qBs/u6PXuHuBbSs9iQuF4aKe/OyOXrOq6o+A0ys9jwuF4aKeRn12Z/0KzUXSCjJcJEndGS7qyc/uSAIMF/XlZ3ckAYaLOqqql4Czn915EjjkZ3f0WpHkN4A/Bb4ryfEku1Z6TquZn3+RJHXnmYskqTvDRZLUneEiSerOcJEkdWe4SJK6M1wkSd0ZLpKk7v4OSzO/ixXxzOgAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "sns.countplot(y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### <font color='green'>3. Algorithm Evaluation<font>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_time = []\n",
    "test_time = []\n",
    "accuracy = []\n",
    "precision = []\n",
    "recall = []\n",
    "f1 = []\n",
    "estimator_name = []"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "def evaluate(estimator, estimator_nm, \n",
    "             x_train, y_train,\n",
    "             x_test, y_test):\n",
    "    '''\n",
    "    To generate performance report for both frovedis and sklearn estimators\n",
    "    '''\n",
    "    estimator_name.append(estimator_nm)\n",
    "    start_time = time.time()\n",
    "    estimator.fit(x_train, y_train)\n",
    "    train_time.append(round(time.time() - start_time, 4))\n",
    "\n",
    "    start_time = time.time()\n",
    "    pred_y = estimator.predict(x_test)\n",
    "    test_time.append(round(time.time() - start_time, 4))\n",
    "    accuracy.append(metrics.accuracy_score(y_test, pred_y))\n",
    "    precision.append(metrics.precision_score(y_test, pred_y))\n",
    "    recall.append(metrics.recall_score(y_test, pred_y))\n",
    "    f1.append(metrics.f1_score(y_test, pred_y))\n",
    "    return metrics.classification_report(y_test, pred_y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.1 Binary LogisticRegression with sag solver"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Frovedis LogisticRegression matrices: \n",
      "              precision    recall  f1-score   support\n",
      "\n",
      "           0       0.86      0.97      0.91     33473\n",
      "           1       0.52      0.16      0.25      6527\n",
      "\n",
      "    accuracy                           0.84     40000\n",
      "   macro avg       0.69      0.57      0.58     40000\n",
      "weighted avg       0.80      0.84      0.80     40000\n",
      "\n",
      "Sklearn LogisticRegression matrices: \n",
      "              precision    recall  f1-score   support\n",
      "\n",
      "           0       0.85      0.98      0.91     33473\n",
      "           1       0.52      0.11      0.18      6527\n",
      "\n",
      "    accuracy                           0.84     40000\n",
      "   macro avg       0.69      0.55      0.55     40000\n",
      "weighted avg       0.80      0.84      0.79     40000\n",
      "\n"
     ]
    }
   ],
   "source": [
    "#Demo: Binary Logistic Regression with sag solver\n",
    "\n",
    "import frovedis\n",
    "TARGET = \"binary_logistic_regression_sag\"\n",
    "from frovedis.exrpc.server import FrovedisServer\n",
    "FrovedisServer.initialize(\"mpirun -np 8 \" + os.environ[\"FROVEDIS_SERVER\"])\n",
    "from frovedis.mllib.linear_model import LogisticRegression as frovLogisticRegression\n",
    "f_est = frovLogisticRegression(penalty='l2', solver='sag')\n",
    "E_NM = TARGET + \"_frovedis_\" + frovedis.__version__\n",
    "f_report = evaluate(f_est, E_NM, \\\n",
    "                    x_train, y_train, x_test, y_test)\n",
    "f_est.release()\n",
    "FrovedisServer.shut_down()\n",
    "\n",
    "import sklearn\n",
    "from sklearn.linear_model import LogisticRegression as skLogisticRegression\n",
    "s_est = skLogisticRegression(penalty='l2', solver='sag')\n",
    "E_NM = TARGET + \"_sklearn_\" + sklearn.__version__\n",
    "s_report = evaluate(s_est, E_NM, \\\n",
    "                    x_train, y_train, x_test, y_test)\n",
    "\n",
    "print(\"Frovedis LogisticRegression matrices: \")\n",
    "print(f_report)\n",
    "print(\"Sklearn LogisticRegression matrices: \")\n",
    "print(s_report)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.2 Linear SVC"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Frovedis Linear SVC metrices: \n",
      "              precision    recall  f1-score   support\n",
      "\n",
      "           0       0.84      0.99      0.91     33473\n",
      "           1       0.57      0.04      0.08      6527\n",
      "\n",
      "    accuracy                           0.84     40000\n",
      "   macro avg       0.70      0.52      0.50     40000\n",
      "weighted avg       0.80      0.84      0.78     40000\n",
      "\n",
      "Sklearn Linear SVC metrices: \n",
      "              precision    recall  f1-score   support\n",
      "\n",
      "           0       0.85      0.98      0.91     33473\n",
      "           1       0.53      0.12      0.19      6527\n",
      "\n",
      "    accuracy                           0.84     40000\n",
      "   macro avg       0.69      0.55      0.55     40000\n",
      "weighted avg       0.80      0.84      0.79     40000\n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/adityaw/virt1/lib64/python3.6/site-packages/sklearn/svm/_base.py:986: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.\n",
      "  \"the number of iterations.\", ConvergenceWarning)\n"
     ]
    }
   ],
   "source": [
    "# Demo: Linear SVC\n",
    "\n",
    "import frovedis\n",
    "TARGET = \"Linear_SVC\"\n",
    "from frovedis.exrpc.server import FrovedisServer\n",
    "FrovedisServer.initialize(\"mpirun -np 8 \" + os.environ[\"FROVEDIS_SERVER\"])\n",
    "from frovedis.mllib.svm import LinearSVC as frovSVC\n",
    "\n",
    "f_est = frovSVC(loss='hinge', max_iter=10000)\n",
    "E_NM = TARGET + \"_frovedis_\" + frovedis.__version__\n",
    "f_report = evaluate(f_est, E_NM, \\\n",
    "                    x_train, y_train, x_test, y_test)\n",
    "f_est.release()\n",
    "FrovedisServer.shut_down()\n",
    "\n",
    "import sklearn\n",
    "from sklearn.svm import LinearSVC as skSVC\n",
    "s_est = skSVC(loss='hinge', max_iter=10000)\n",
    "E_NM = TARGET + \"_sklearn_\" + sklearn.__version__\n",
    "s_report = evaluate(s_est, E_NM, \\\n",
    "                    x_train, y_train, x_test, y_test)\n",
    "\n",
    "# SVC: Precision, Recall and F1 score for each class\n",
    "print(\"Frovedis Linear SVC metrices: \")\n",
    "print(f_report)\n",
    "print(\"Sklearn Linear SVC metrices: \")\n",
    "print(s_report)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Performance summary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>estimator</th>\n",
       "      <th>train time</th>\n",
       "      <th>test time</th>\n",
       "      <th>accuracy</th>\n",
       "      <th>precision</th>\n",
       "      <th>recall</th>\n",
       "      <th>f1-score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>binary_logistic_regression_sag_frovedis_0.9.10</td>\n",
       "      <td>1.7105</td>\n",
       "      <td>0.0586</td>\n",
       "      <td>0.838925</td>\n",
       "      <td>0.520388</td>\n",
       "      <td>0.164241</td>\n",
       "      <td>0.249680</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>binary_logistic_regression_sag_sklearn_0.24.1</td>\n",
       "      <td>18.8960</td>\n",
       "      <td>0.0287</td>\n",
       "      <td>0.838425</td>\n",
       "      <td>0.523358</td>\n",
       "      <td>0.109851</td>\n",
       "      <td>0.181588</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Linear_SVC_frovedis_0.9.10</td>\n",
       "      <td>13.6409</td>\n",
       "      <td>0.0616</td>\n",
       "      <td>0.838475</td>\n",
       "      <td>0.566000</td>\n",
       "      <td>0.043358</td>\n",
       "      <td>0.080546</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Linear_SVC_sklearn_0.24.1</td>\n",
       "      <td>1941.1949</td>\n",
       "      <td>0.0287</td>\n",
       "      <td>0.838850</td>\n",
       "      <td>0.527759</td>\n",
       "      <td>0.117972</td>\n",
       "      <td>0.192837</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                        estimator  train time  test time  \\\n",
       "0  binary_logistic_regression_sag_frovedis_0.9.10      1.7105     0.0586   \n",
       "1   binary_logistic_regression_sag_sklearn_0.24.1     18.8960     0.0287   \n",
       "2                      Linear_SVC_frovedis_0.9.10     13.6409     0.0616   \n",
       "3                       Linear_SVC_sklearn_0.24.1   1941.1949     0.0287   \n",
       "\n",
       "   accuracy  precision    recall  f1-score  \n",
       "0  0.838925   0.520388  0.164241  0.249680  \n",
       "1  0.838425   0.523358  0.109851  0.181588  \n",
       "2  0.838475   0.566000  0.043358  0.080546  \n",
       "3  0.838850   0.527759  0.117972  0.192837  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# ---- evaluation summary ----\n",
    "summary = pd.DataFrame(OrderedDict({ \"estimator\": estimator_name,\n",
    "                                     \"train time\": train_time,\n",
    "                                     \"test time\": test_time,\n",
    "                                     \"accuracy\": accuracy,\n",
    "                                     \"precision\": precision,\n",
    "                                     \"recall\": recall,\n",
    "                                     \"f1-score\": f1\n",
    "                                  }))\n",
    "summary"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
